
    Effective visualisation of callgraphs for optimisation of parallel programs: a design study

    Parallel programs are increasingly used to perform scientific calculations on supercomputers. Optimising parallel applications to scale well, and ensuring maximum parallelisation, is a challenging task. The performance of parallel programs is affected by a range of factors, such as limited network bandwidth, the choice of parallel algorithm, memory latency and processor speed. The term “performance bottleneck” refers to an obstacle that slows the execution of a parallel program. Visualisation tools are used to identify performance bottlenecks in parallel applications in order to optimise execution and fully utilise the available computational resources. The TAU (Tuning and Analysis Utilities) callgraph visualisation is one such tool, commonly used to analyse the performance of parallel programs. It shows the relationships between the different parts of a parallel program (for example, routines, subroutines, modules and functions) executed during a run. TAU’s callgraph tool has limitations: it cannot effectively display the large volumes of performance data (metrics) generated during execution, and the relationships between the parts of the program can be hard to see. The aim of this work is to design an effective callgraph visualisation that enables users to efficiently identify performance bottlenecks incurred during the execution of a parallel program. This design study employs a user-centred iterative methodology to develop a new callgraph visualisation, involving expert users in the three developmental stages of the system. These stages produce prototypes of increasing fidelity, from a paper prototype to high-fidelity interactive prototypes in the final design.
The paper-based prototype of the new callgraph visualisation was evaluated by a single expert from the University of Oregon’s Performance Research Lab, which developed the original callgraph visualisation tool. This expert is a computer scientist who holds a doctoral degree in computer and information science from the University of Oregon and heads the Performance Research Lab. The interactive prototype (the first high-fidelity design) was evaluated against the original TAU callgraph system by a team of expert users, comprising doctoral graduates and undergraduate computer scientists from the University of Tennessee, United States of America (USA). The final complete prototype (the second high-fidelity design) was developed with the D3.js JavaScript library and evaluated by users (doctoral graduates and undergraduate computer science students) from the University of Tennessee, USA. Most of these users have between 3 and 20 years of experience in High Performance Computing (HPC), whereas the expert has more than 20 years of experience in developing visualisation tools for analysing the performance of parallel programs. The expert and users were chosen to test the new callgraphs against the original callgraphs because they have experience in analysing, debugging, parallelising, optimising and developing parallel programs. After the evaluations, the final visualisation design was found to be effective, interactive, informative and easy to use. It is anticipated that the final design of the callgraph visualisation will help parallel computing users to effectively identify performance bottlenecks within parallel programs, and enable full utilisation of computational resources within a supercomputer.
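
    To illustrate the kind of data a callgraph visualisation conveys, the following is a minimal sketch and not the study's implementation. It assumes a tree-shaped callgraph and per-routine inclusive times; the routine names and timings are invented for illustration. It derives each routine's exclusive time (inclusive time minus the inclusive time of its callees) and picks the routine with the largest exclusive time as a candidate bottleneck.

```python
# Hypothetical callgraph: caller -> list of callees (a tree, for simplicity).
calls = {
    "main": ["solve", "write_output"],
    "solve": ["exchange_halo", "compute"],
    "exchange_halo": [],
    "compute": [],
    "write_output": [],
}

# Invented inclusive times in seconds (time in the routine plus its callees).
inclusive = {
    "main": 100.0,
    "solve": 90.0,
    "exchange_halo": 55.0,
    "compute": 30.0,
    "write_output": 8.0,
}


def exclusive_times(calls, inclusive):
    """Exclusive time = inclusive time minus the inclusive time of direct callees."""
    return {
        name: inclusive[name] - sum(inclusive[callee] for callee in callees)
        for name, callees in calls.items()
    }


exclusive = exclusive_times(calls, inclusive)

# The routine with the largest exclusive time is a candidate bottleneck.
hotspot = max(exclusive, key=exclusive.get)
print(hotspot, exclusive[hotspot])
```

    A visualisation would typically map such a metric to node colour or size, so that the hotspot stands out within the call structure.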

    Scalability of DL_POLY on High Performance Computing Platform

    This paper presents a case study on the scalability of several versions of the molecular dynamics code DL_POLY, performed on the e1350 IBM Linux cluster, Sun system and Lengau supercomputers at South Africa’s Centre for High Performance Computing. Within this study, different problem sizes were designed and the same chosen systems were employed to test the performance of DL_POLY under weak and strong scaling. It was found that the speed-up results for the small systems were better than those for the large systems on both Ethernet and InfiniBand networks. However, simulations of large systems in DL_POLY performed well over the InfiniBand network on the Lengau cluster compared with the e1350 and Sun supercomputers.
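
    For readers unfamiliar with the terms, strong and weak scaling can be sketched as follows, using the standard textbook definitions. This is not the paper's analysis: the timings are invented, and the function names are hypothetical. Strong scaling fixes the problem size and measures speed-up S(p) = T(1)/T(p) and efficiency S(p)/p. Weak scaling grows the problem size with the process count and measures efficiency T(1)/T(p).

```python
def strong_scaling(times):
    """times: {process_count: wall_time} for a FIXED problem size.

    Assumes a 1-process baseline is present.
    Returns {process_count: (speedup, efficiency)}.
    """
    t1 = times[1]
    return {p: (t1 / t, (t1 / t) / p) for p, t in times.items()}


def weak_scaling(times):
    """times: {process_count: wall_time}, problem size grown in proportion to p.

    Returns {process_count: efficiency}; 1.0 means ideal weak scaling.
    """
    t1 = times[1]
    return {p: t1 / t for p, t in times.items()}


# Invented wall times in seconds, for illustration only.
strong = strong_scaling({1: 100.0, 2: 50.0, 4: 40.0})
weak = weak_scaling({1: 10.0, 4: 12.5})

for p, (speedup, efficiency) in sorted(strong.items()):
    print(f"p={p}: speed-up {speedup:.2f}, efficiency {efficiency:.3f}")
```

    With these made-up numbers, speed-up flattens between 2 and 4 processes, which is the typical sign that communication cost is starting to dominate a small problem.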